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ORIGINAL ARTICLE 

Genetic risk prediction and neurobiological understanding of 
alcoholism 

DF Levey\ H Le-Niculescu\ J Frank^ M Ayalew\ N Jain\ B Kirlin\ R Learman\ E Winiger\ Z Rodd\ A Shekhar\ N Schork^ F Kiefe^ 
N Wodarz^ B Muiier-Myhsok^ N Dahmen^ GESGA Consortium, M Nothen^ R Sherva^ L Farrer^ AH Smith^° HR Kranzler", 
M RietscheP, J Gelernter^° and AB Niculescu^'^^ 

We have used a translational Convergent Functional Genomics (CFG) approach to discover genes involved in alcoholism, by gene- 
level integration of genome-wide association study (GWAS) data from a German alcohol dependence cohort with other genetic and 
gene expression data, from human and animal model studies, similar to our previous work in bipolar disorder and schizophrenia. A 
panel of all the nominally significant P-value SNPs in the top candidate genes discovered by CFG (n = 135 genes, 713 SNPs) was 
used to generate a genetic risk prediction score (GRPS), which showed a trend towards significance (P = 0.053) in separating 
alcohol dependent individuals from controls in an independent German test cohort. We then validated and prioritized our top 
findings from this discovery work, and subsequently tested them in three independent cohorts, from two continents. A panel of all 
the nominally significant P-value single-nucleotide length polymorphisms (SNPs) in the top candidate genes discovered by CFG 
(n = 135 genes, 713 SNPs) were used to generate a Genetic Risk Prediction Score (GRPS), which showed a trend towards significance 
(P = 0.053) in separating alcohol-dependent individuals from controls in an independent German test cohort. In order to validate 
and prioritize the key genes that drive behavior without some of the pleiotropic environmental confounds present in humans, we 
used a stress-reactive animal model of alcoholism developed by our group, the D-box binding protein (DBP) knockout mouse, 
consistent with the surfeit of stress theory of addiction proposed by Koob and colleagues. A much smaller panel (n = 1 1 genes, 66 
SNPs) of the top CFG-discovered genes for alcoholism, cross-validated and prioritized by this stress-reactive animal model showed 
better predictive ability in the independent German test cohort (P = 0.041). The top CFG scoring gene for alcoholism from the initial 
discovery step, synuclein alpha (SNCA) remained the top gene after the stress-reactive animal model cross-validation. We also 
tested this small panel of genes in two other independent test cohorts from the United States, one with alcohol dependence 
(P = 0.0001 2) and one with alcohol abuse (a less severe form of alcoholism; P = 0.0094). SNCA by itself was able to separate 
alcoholics from controls in the alcohol-dependent cohort (P = 0.00001 3) and the alcohol abuse cohort (P = 0.023). So did eight other 
genes from the panel of 1 1 genes taken individually, albeit to a lesser extent and/or less broadly across cohorts. SNCA, GRI\/13 and 
MBP survived strict Bonferroni correction for multiple comparisons. Taken together, these results suggest that our stress-reactive 
DBP animal model helped to validate and prioritize from the CFG-discovered genes some of the key behaviorally relevant genes for 
alcoholism. These genes fall into a series of biological pathways involved in signal transduction, transmission of nerve impulse 
(including myelination) and cocaine addiction. Overall, our work provides leads towards a better understanding of illness, 
diagnostics and therapeutics, including treatment with omega-3 fatty acids. We also examined the overlap between the top 
candidate genes for alcoholism from this work and the top candidate genes for bipolar disorder, schizophrenia, anxiety from 
previous CFG analyses conducted by us, as well as cross-tested genetic risk predictions. This revealed the significant genetic overlap 
with other major psychiatric disorder domains, providing a basis for comorbidity and dual diagnosis, and placing alcohol use in the 
broader context of modulating the mental landscape. 
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INTRODUCTION 

Alcohol use and overuse (alcoholism) have deep historical and 
'Drunkenness is nothing but voluntary madness.' cultural roots, as well as important medical and societal 

— Seneca consequences.^ Whereas there is evidence for roles for both 
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genes and environment in alcoholism, a comprehensive biological 
understanding of the disorder has been elusive so far, despite 
extensive work in the field. Most notably, there has been until 
recently insufficient translational integration across functional 
and genetic studies, and across human and animal model 
studies, resulting in missed opportunities for a comprehensive 
understanding. 

As part of a translational Convergent Functional Genomics 
(CFG) approach, developed by us over the last 15 years,^ and 
expanding upon our earlier work on identifying genes for 
alcoholism,^"^ we set out to comprehensively identify candidate 
genes, pathways and mechanisms for alcoholism, integrating the 
available evidence in the field to date. We have used data from a 
published German genome-wide association study for 
alcoholism.^ We integrated those data in a Bayesian-like manner 
with other human genetic data (association or linkage) for 
alcoholism, as well as human gene expression data, post- 
mortem brain gene expression data and peripheral (blood and 
cell culture) gene expression data. We also used relevant animal 
model genetic data (transgenic and quantitative trait loci (QTL)), as 
well as animal model gene expression data (brain and blood) 
generated by our group and others (Figures 1 and 2). Human data 
provide specificity for the illness, and animal model data provide 
sensitivity of detection. Together, they helped to identify and 
prioritize candidate genes for the illness using a polyevidence CFG 
score, resulting in essence in a de facto field-wide integration 
putting together all the available lines of evidence to date. Once 
that is done, biological pathway analyses can be conducted and 
mechanistic models can be constructed. 

An obvious next step is developing a way of applying that 
knowledge to genetic testing of individuals to determine risk for 
the disorder. On the basis of our comprehensive identification of 
top candidate genes described in this paper, we have chosen all 
the nominally significant P-value SNPs corresponding to each of 
those 135 genes from the GWAS data set used for discovery (top 
candidate genes prioritized by CFG with the score of 8 and above 
(>50% maximum possible CFG score of 16) and assembled a 
Genetic Risk Prediction panel out of those 713 SNPs. We then 
developed a Genetic Risk Prediction Score (GRPS) for alcoholism 
based on the presence or absence of the alleles of the SNPs 
associated with the illness from the discovery GWAS, and tested 
the GRPS in an independent German cohort,^^ to see whether it 




Figure 2. Top candidate genes for alcoholism. 



can differentiate alcohol-dependent subjects from controls, 
observing a trend towards significance. 

In order to validate and prioritize genes in this panel using a 
behavioral prism, we then looked at the overlap between our 
panel of 135 top candidate genes and genes changed in 
expression in a stress-reactive animal model for alcoholism 
developed by our group, the DBP knockout mouse."^'^ We used 
this overlap to reduce our panel to 1 1 genes (66 SNPs). 

This small panel of 1 1 genes was subsequently tested and 
shown to be able to differentiate between alcoholics and controls 
in the three independent test cohorts, one German^^ and two US- 
based,^^ suggesting that the animal model served in essence as a 
filter to identify from the larger list of CFG-prioritized genes the 
key behaviorally relevant genes. Our results indicate that panels of 
SNPs in top genes identified and prioritized by CFG analysis and 
by a behaviorally relevant animal model can differentiate between 
alcoholics and controls at a population level (Figure 3), although at 
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Figure 1. Convergent Functional Genomics. 
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Figure 3. Genetic Risk Prediction using a panel of top candidate genes for alcoholisnn (GRPS-11). Testing in independent cohorts 3 and 4. 
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Figure 4. Overlap of alcoholism versus other major psychiatric 
disorders. Top candidate genes for alcoholism identified by CFG 
(n = 135) in the current study versus top candidate genes for other 
psychiatric disorders and a stress-driven animal model of alcoholism 
(DBP knockout mouse) from our previous work. 
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Figure 5. Mindscape (mental landscape)-dimensional view of genes 
that may be involved in alcoholism and other major psychiatric 
disorders. 



an individual level the margin may be small (Supplementary 
Figure S2). The latter point suggests that, similar to bipolar 
disorder^^ and schizophrenia,^"^ the contextual cumulative combi- 
natorics of common gene variants and environment^^ has a major 
role in risk for illness. 

Lastly, we have looked at overlap with genes for other major 
psychiatric disorder domains (bipolar disorders, anxiety disorders, 
schizophrenias) from our previous studies and provide evidence 



for shared genes (Figures 4 and 5) as well as shared genetic risk 
(Figure 6). 

Overall, this work sheds light on the genetic architecture and 
pathophysiology of alcoholism, provides mechanistic targets for 
therapeutic intervention and has implications for genetic testing 
to assess risk for illness before the illness manifests itself clinically, 
opening the door for enhanced prevention strategies at a young 
age. As alcoholism is a disease that does not exist if the exogenous 
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Figure 6. Genetic load for bipolar disorder and schizophrenia in 
alcoholisnn. A total of 34 out of 66 SNPs in our alcohol GRPS-1 1 panel 
(current work; in n = 10 genes), 42 out of 224 SNPs in our bipolar 
GRPS^^ (in n = 34 qenes) and 151 out of 542 SNPs in our 
schizophrenia GRPS (in n = 35 genes) were present and tested in 
the alcohol cohorts 3 and 4. See also Supplennentary Table S7. 
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agent (alcohol) is not consumed, the use of genetic information to 
inform lifestyle choices could be quite powerful. 



MATERIALS AND METHODS 

Human subject cohorts 

Discovery cohort (cohort 1): GWAS for alcohol dependence from Germany. 
Data for the discovery CFG work (Cohort 1) were obtained from a GWA 
study of self-reported German descent subjects, consisting of 41 1 alcohol- 
dependent male subjects and 1307 population-based controls (663 male 
and 644 female subjects).^ Individuals were genotyped using HumanHap 
550 BeadChips (lllumina Inc, San Diego, CA, USA). SNPs with a nominal 
allelic P-value < 0.05 were selected for analysis. No Bonferroni correction 
was performed. 

Test cohort 2 (alcohol dependence, Germany). An independent test cohort 
of German descent^^ consisting of 740 alcohol-dependent male subjects 
and 861 controls (276 male and 585 female subjects) was used for testing 
the results of the discovery analyses. Individuals were genotyped using 
lllumina Human610Quad or lllumina Human660w Quad BeadChips 
(lllumina Inc). The controls were genotyped using lllumina HumanHap550 
Bead Chips. 

Test cohort 3 (alcohol dependence, United States) and test cohort 4 (alcohol 
abuse. United States). The sample consisted of small nuclear families 
originally collected for linkage studies, and unrelated individuals, 
Caucasians and African-American, male and female subjects. The subjects 
were recruited at five US clinical sites: Yale University School of Medicine 
(APT Foundation; New Haven, CT, USA), the University of Connecticut 
Health Center (Farmington, CT, USA), the University of Pennsylvania 
Perelman School of Medicine (Philadelphia, PA, USA), the Medical 
University of South Carolina (Charleston, SC, USA) and McLean Hospital 
(Belmont, MA, USA). All subjects were interviewed using the Semi- 
Structured Assessment for Drug Dependence and Alcoholism to derive 
diagnoses for lifetime alcohol dependence, alcohol abuse and other major 
psychiatric traits according to the DSM-IV criteria. There were 1687 male 
subjects with alcohol dependence, 366 male subjects with alcohol abuse 
and 475 male controls. There were 1081 female subjects with alcohol 
dependence, 234 female subjects with alcohol abuse and 786 female 
controls (Table 1). Individuals were genotyped on the lllumina 
HumanOmnil-Quad vl.O microarray (988 306 autosomal SNPs). GWAS 
genotyping was conducted at the Yale Center for Genome Analysis and 
the Center for Inherited Disease Research. Genotypes were called using the 
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GenomeStudio software V2011.1 and genotyping module version 1.8.4 
(lllumina Inc)." 

Gene identification in discovery cohort 1 

Quality control. Genotype data had been filtered using stringent quality- 
control criteria as described earlier^^ and accounted for call rate, 
population substructure, cryptic relatedness, minor allele frequency and 
batch effects. 

Association test in discovery sample. Association testing was performed 
using PLINK 1.07 (http://pngu.mgh.harvard.edu/ ~ purcell)^^ software 
package. A logistic regression modelling approach was applied to correct 
for population stratification. Therefore, principal component analysis was 
conducted considering only independent autosomal SNPs with minor 
allele frequency >0.05 and pairwise < 0.05 within a 200-SNP window. LD 
filtering resulted in a set of 28 505 SNPs used for principal component 
analysis, which was carried out using GCTA 1.04 (http://www.complex- 
traitgenomics.com/software/gcta/).^^ The first two principal components 
resulting from this analysis were included as covariates in the logistic 
regression model. 

Assignment of SNPs to genes. Genes corresponding to SNPs were 
identified initially using the annotation file from the lllumina website 
(http://www.illumina.com, HumanHAP550v3_Gene_Annotation). Next, 
genes were cross-checked with GeneCards (http://www.genecards.org) to 
ensure that each gene symbol was current. Any gene symbol that matched 
to a different gene symbol in Gene Cards was checked to verify 
chromosome number and location match with the original gene, and 
was replaced with the current GeneCards gene symbol. SNPs from the 
original annotation files that had no gene matches in the annotation file 
and UCSC Genome Browser (that is, not falling within an exon or intron of 
a known gene) were assumed to regulate and thus implicate the gene 
closest to the SNP location, using the refSNP database from NCBI (http:// 
www.ncbi.nlm.nih.gov/snp/7SITE = NcbiHome&submit = Go). 

Convergent functional genomic analyses 

Databases. We have established in our laboratory (Laboratory of 
Neurophenomics, Indiana University School of Medicine, www.neurophe- 
nomics.info) manually curated databases of all the human gene expression 
(post-mortem brain, blood and cell cultures), human genetic (association, 
copy number variants (CNVs) and linkage), animal model genetic and 
animal model gene expression studies published to date on psychiatric 
disorders. Only the findings deemed significant in the primary publication, 
by the study authors, using their particular experimental design and 
thresholds, are included in our databases. Our databases include only 
primary literature data and do not include review papers or other 
secondary data integration analyses to avoid redundancy and circularity. 
These large and constantly updated databases have been used in our CFG 
cross-validation and prioritization (Figure 1). 

Human post-mortem brain, blood and other peripheral tissue gene expression 
evidence. Information about genes was obtained and imported in our 
databases searching the primary literature with PubMed (http://ncbi.nlm. 
nih.gov/PubMed), using various combinations of keywords. For this work, 
the keywords were as follows: alcohol, alcoholism, human, brain, 
postmortem, lymphocytes, blood, cells and gene expression. 

Human genetic evidence (association, linkage). To designate convergence 
for a particular gene, the gene had to have independent published 
evidence of association or linkage for alcoholism. We sought to avoid using 
any association studies that included subjects who were also included in 
our discovery or test cohorts. For linkage, the location of each gene was 
obtained through GeneCards (http://www.genecards.org), and the sex- 
averaged cM location of the start of the gene was then obtained through 
http://compgen.rutgers.edu/old/map-interpolator/. For linkage conver- 
gence, per our previously published criteria, the start of the gene had to 
map within 5 cM of the location of a marker linked to the disorder with a 
lod score of > 2. 

Animal model brain and blood gene expression evidence. For animal model 
brain and blood gene expression evidence, we have used our own rat 



model data sets,^ as well as published reports from the literature curated in 
our databases. 

The rat animal model experimental work from our group was previously 
described.^ The experimental approaches used to produce the animal 
model data for CFG analysis were carried out in two rat lines selectively 
bred for divergent alcohol preference: inbred alcohol-preferring (iP) versus 
inbred alcohol-non-preferring (iNP) rats. Following five brain regions were 
chosen for gene expression studies in these rat lines: the frontal cortex, 
amygdala, caudate-putamen, nucleus accumbens and hippocampus. 
Animal studies, as well as human imaging and post-mortem analyses, 
had previously provided evidence that these regions are implicated in 
alcoholism. 

Data for the analysis came from studies of three experimental 
paradigms. Paradigm 1 examined basal level of gene expression in the 
brains of the alcohol-naive iP and iNP lines of rats. This basal comparison 
was performed to determine innate differences between these two lines 
with a marked divergence in the willingness to consume alcohol. We 
hypothesized that the innate differences in gene expression between the 
iP and iNP would involve some of the genes associated with an increased 
susceptibility for alcohol dependence. Paradigm 2 examined the effects of 
chronic 24-h free-choice alcohol consumption on gene expression in iP rats 
compared with alcohol-naive iP rats. This paradigm looked for gene 
expression changes in the brain associated with the direct influence of 
peripherally self-administered alcohol in the genetically susceptible rats. In 
Paradigm 3, iP rats were allowed to self-infuse alcohol directly into the 
posterior ventral tegmental area, the originating area of the mesolimbic 
dopamine system. The advantage of this latter procedure is that it isolates 
the neurocircuitry involved in alcohol reinforcement, and eliminates the 
peripheral effects of alcohol. Following the establishment of alcohol self- 
administration into the posterior VTA, gene expression levels in target 
brain areas were measured and compared with P rats that received 
artificial cerebral spinal fluid infusions into the posterior VTA. 

Animal model genetic evidence. To search for mouse genetic evidence 
(transgenic and QTL) for our candidate genes, we utilized PubMed as well 
as the Mouse Genome Informatics (http://www.informatics.jax.org; Jackson 
Laboratory, Bar Harbor, ME, USA) database, and used the search 'Genes 
and Markers' form to find transgenic in categories for abnormal alcohol 
consumption, alcohol preference, alcohol aversion, impaired behavioral 
response to alcohol, hyperactivity elicited by ethanol administration and 
enhanced behavioral response to alcohol. For QTL convergence, the start 
of the gene had to map within 5 cM of the location of these markers. 

CFG scoring. We used a nominal P-value threshold (having at least one 
SNP with P<0.05) for including genes from the discovery GWAS in the 
CFG analysis. No Bonferroni correction was performed. 
Internal score: For each of these genes implicated by SNPs, we 
calculated the percent of SNPs that were nominally significant (ratio of 
number of nominally significant SNPs over total number of SNPs tested for 
that gene, multiplied by 100), obtaining a distribution of values. The genes 
in the top 0.1% of the distribution were given an internal score of 4 points, 
those in the top 5% of the distribution were given 3 points and the 
remaining genes all received 2 points. The internal score provides a 
prioritization of genes based on GWAS results and might prioritize genes 
that have higher biological relevance and heterogeneity. 
External score: Human and animal model data, genetic and gene 
expression were integrated and tabulated, resulting in a polyevidence CFG 
score. All six cross-validating lines of evidence (human data and animal 
model data) were weighted such that evidence from human studies was 
prioritized 2x over evidence from animal models, gene expression 
evidence was prioritized 2x over genetic evidence and brain evidence 
was prioritized 2x over peripheral tissue evidence (Figure 1). For human 
genetic evidence, 2 points were assigned if it was from association and 1 
point if it was from linkage studies. For animal model genetic evidence, 2 
points if it was from transgenic and 1 point if it was from QTL. The 
maximum possible external score for each gene is 12. 

We have capped (one positive study scores maximum points) the 
hypothesis-driven candidate gene genetic association evidence and 
animal model genetic (transgenic) lines of evidence, regardless of how 
many other such studies support that gene, to avoid potential 'popularity' 
biases, where some genes are more studied than others. For discovery- 
driven gene expression studies, we have capped (one positive study scores 
maximum points) the human post-mortem brain work because of the 
paucity of brain collections and the fact that such studies often use the 
same brain bank sources. However, we have not similarly capped the 
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animal model brain and blood gene expression evidence, as such studies 
are not only discovery-based, but use independent cohorts of animals. 
These were scored differentially, based on the number of studies showing 
evidence for a given gene: three or more different studies received full 
maximum points, two studies 0.75 of maximum points and one study 0.5 
of the maximum points. Our group generated data sets for three 
independent animal studies for this analysis (see above). 

The more lines of evidence for a gene — that is, the more times a gene 
shows up as a positive finding across independent studies, platforms, 
methodologies and species— the higher its external CFG score (Figure 1). 
This is similar conceptually to the Google PageRank algorithm, in which the 
more links to a page, the higher it comes up on the search prioritization 
list. It has not escaped our attention that other ways of weighing the lines 
of evidence may give slightly different results in terms of prioritization, if 
not in terms of the list of genes per se. Nevertheless, we think this simple 
scoring system provides a good separation of genes, with specificity 
provided by human data and sensitivity provided by animal model data. 

Prioritizing top olcotioiism candidate genes that overlap witti a stress-reactive 
animal model of alcoholism. Stress has been proposed as a driver of 
alcoholism, notably by Koob and colleagues,^^'^^ as well as by Heilig and 
colleagues.^° We have previously identified the circadian clock gene DBP 
as a candidate gene for bipolar disorder,^^ as well as for alcoholism,^ using 
a CFG approach. In follow-up work, we established mice with a 
homozygous deletion of DBP (DBP KO) as a stress-reactive genetic animal 
model of bipolar disorder and alcoholism.^ We reported that DBP KO mice 
have lower locomotor activity, blunted responses to stimulants and gain 
less weight over time. In response to a stress paradigm that translationally 
mimics what can happen in humans (chronic stress-isolation housing for 
4 weeks, with acute stress, on top of that- experimental handling in week 
3), the mice exhibit a diametric switch in these phenotypes. DBP KO mice 
are also activated by sleep deprivation, similar to bipolar patients, and that 
activation is prevented by treatment with the mood stabilizer drug 
valproate. Moreover, these mice show increased alcohol intake following 
exposure to stress. Microarray studies of brain and blood revealed a 
pattern of gene expression changes that may explain the observed 
phenotypes. CFG analysis of the gene expression changes identified a 
series of candidate genes and blood biomarkers for bipolar disorder, 
alcoholism and stress reactivity. Subsequent studies by us showed that 
treatment with the omega-3 fatty acid docosahexaenoic acid (DHA) 
normalized the gene expression (brain and blood) and behavioral 
phenotypes of this mouse model, including reducing alcohol 
consumption.^ 

We examined the overlap between the top candidate genes for 
alcoholism from the current analysis and the top candidate genes from the 
DBP KO stress mice, thus reducing the list from 135 to 11 (Figure 4). 

Pathway analyses. IPA 9.0 (Ingenuity Systems, www.ingenuity.com. Red- 
wood City, CA, USA) was used to analyze the biological roles, including top 
canonical pathways and diseases, of the candidate genes resulting from 
our work (Table 2 and Supplementary Table S2), as well as used to identify 
genes in our data sets that are the targets of existing drugs 
(Supplementary Table S3). Pathways were identified from the IPA library 
of canonical pathways that were most significantly associated with genes 
in our data set. The significance of the association between the data set 
and the canonical pathway was measured in 2 ways: (1) a ratio of the 
number of molecules from the data set that map to the pathway divided 
by the total number of molecules that map to the canonical pathway is 
displayed; (2) Fisher's exact test was used to calculate a P-value 
determining the probability that the association between the genes in 
the data set and the canonical pathway is explained by chance alone. We 
also conducted a KEGG pathway analysis through the Partek Genomic 
Suites 6.6 software package, Partek Inc, Saint Louis, MO, USA), and GeneGo 
MetaCore from Thomson Reuters, New York, NY, USA) pathway analyses 
(https://portal.genego.com/). 

Epistasis testing. The test cohort 2 data were used to test for epistatic 
interactions among the best P-value SNPs in the 1 1 top candidate genes 
from our work. SNP-SNP allelic epistasis was tested for each distinct pair of 



SNPs between genes, using the PLINK software package (Supplementary 
Table S5). 

Genetic risk prediction 

The software package PLINK 1.07 (http://pngu.mgh.harvard.edu/~pur- 
cell)^^ was used to extract individual genotype information for each 
subject from the test cohorts 2, 3 and 4 data files. 

As we had previously performed for bipolar disorder and schizophrenia, 
we developed a polygenic GRPS for alcoholism based on the presence or 
absence of the alleles of the SNPs associated with illness in the discovery 
GWAS cohort 1, and tested the GRPS in three independent cohorts, from 
different geographic areas, ethnicities and different types of alcoholism. 
We tested two panels: a larger panel containing all the nominally 
significant SNPs in top CFG scoring candidate genes (n = 135) from the 
discovery GWAS1 in the top CFG-prioritized genes (Supplementary Tables 
SI and S4) and a smaller one (n = 11) containing genes out of the larger 
panel that were cross-validated using an animal model of alcoholism. 

Of note, our genes, SNP panels and choice of affected alleles were based 
solely on analysis of the discovery GWAS1, which is our discovery cohort, 
completely independently from the test cohorts. Each SNP has two alleles 
(represented by base letters at that position). One of them is associated 
with the illness (affected), the other not (non-affected), based on the odds 
ratios from the discovery GWAS1 . We assigned the affected allele a score of 
1 and the non-affected allele a score of 0. A two-dimensional matrix of 
subjects by GRP panel alleles is generated, with the cells populated by 0 or 
1. A SNP in a particular individual subject can have any permutation of 1 
and 0 (1 and 1, 0 and 1, 0 and 0). By adding these numbers, the minimum 
score for a SNP in an individual subject is 0, and the maximum score is 2. 
By adding the scores for all the alleles in the panel, averaging that and 
multiplying by 100, we generated for each subject an average score 
corresponding to a genetic loading for disease, which we call Genetic Risk 
Predictive Score.^^'^^ 

To test for significance, a one-tailed f-test with unequal variance was 
performed between the alcoholic subjects and the control subjects, 
looking at differences in GRPS. 

Receiver operating characteristic curves. Receiver operating characteristic 
curves were plotted using IBM SPSS Statistics 21. Diagnosis was converted 
to a binary call of 0 (control) or 1 (alcohol-dependent or abuser) and 
entered as the state variable, with calculated GRPS entered as the test 
variable (Supplementary Figure S2). 



RESULTS 

Top candidate genes 

To minimize false-negatives, we initially cast a wide net, using as a 
filter a minimal requirement for a gene to have both some GWAS 
evidence and some additional independent evidence. Thus, out of 
the 6085 genes with at least a SNP at P<0.05 in the discovery 
GWAS cohort 1, we generated a list of 3142 genes that also had 
some additional line of evidence (human or animal model data), 
implicating them in alcoholism (CFG score >2.5 (>2 internal) 
-F(>0.5 external)). This suggests, using these minimal thresholds 
and requirements, that the repertoire of genes potentially 
involved directly or indirectly in alcohol consumption and 
alcoholism may be quite large, similar to what we have previously 
seen for bipolar disorder^^ and schizophrenia.^^ To minimize false- 
positives, we used an internal score based on percent of SNPs in a 
gene that were nominally significant, with 4 points for those in the 
top 0.1% of the distribution (n = 77), 3 points for those in the top 
5% of the distribution (n = 561) and 2 points for the rest of the 
nominally significant SNPs (n = 5447). We then used the CFG 
analysis and scoring integrating multiple lines of evidence to 
prioritize this list of genes (Figure 1) and focused our subsequent 
analyses on only the top CFG scoring candidate genes. Overall, 
135 genes had a CFG score of 8 and above (>50% of maximum 
possible score of 16). 

Of note, there was no correlation between CFG prioritization 
and gene size, thus excluding a gene-size effect for the observed 
enrichment (Supplementary Figure SI). 
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Biological pathways and drug targets 

Pathway analyses were carried out on the top candidate genes 
(Table 2). Notably, Gai signaling, cocaine addiction and transmis- 
sion of nerve impulses were the top biological pathways in 
alcoholism, which may be informative for treatments and drug 
discovery efforts by pharmaceutical companies. Of note, these top 
candidate genes were identified and prioritized only for evidence 
for alcoholism before pathway analyses; therefore, the overlap 
with cocaine addiction is a completely independent result, 
suggesting a shared drive and neurobiology. Consistent with that, 
two of our 135 top candidate genes for alcoholism (CPE and VWF) 
had SNPs with P< 10"^ in a recent GWAS of cocaine addiction.^^ 

Some of the top alcohol candidate genes have prior evidence of 
being modulated by the omega-3 fatty acid DHA in our DBP 
mouse animal model (Table 3 and Supplementary Table S1). That 
is of particular interest, as we have previously shown that 
treatment with the omega-3 fatty acid DHA decreased alcohol 
consumption in that animal model, as well as in another 
independent animal model, the alcohol-preferring P rats.^ 
Omega-3 fatty acids, particularly DHA, have been described to 
have alcoholism, mood, psychosis and suicide-modulating proper- 
ties, in preclinical models as well as some human clinical trials and 
epidemiological studies. For example, deficits in omega-3 fatty 
acids have been linked to increased depression and aggression in 
animal models^"^'^^ and humans.^^'^^ DHA prevents ethanol 
damage in vitro in rat hippocampal slices.^^ Omega-3 supple- 
mentation can prevent oxidative damage caused by prenatal 
alcohol exposure in rats.^^ Of note, deficits in DHA have been 
reported in erythrocytes^^ and in the post-mortem orbitofrontal 
cortex of patients with bipolar disorder, and were greater in those 
that had high versus those that had low alcohol abuse.''^ Low DHA 
levels may be a risk factor for suicide.^^'^^ Omega-3 fatty acids 
have been reported to be clinically useful in the treatment of both 
mood^"^"^^ and psychotic disorders.^^"^° 

Other existing pharmacological drugs that modulate alcohol 
candidate genes identified by us include, besides benzodiaze- 
pines, dopaminergic agents, glutamatergic agents, serotonergic 
agents, as well as statins (Supplementary Table S3). 

Genetic risk prediction score 

Once the genes involved in a disorder are identified, and 
prioritized for likelihood of involvement, then an obvious next 
step is developing a way of applying that knowledge to genetic 
testing of individuals to determine risk for the disorder. On the 
basis of our identification of top candidate genes described above 
using CFG, we pursued a polygenic panel approach, with digitized 
binary scoring for presence or absence, similar to the one we have 
devised and used in the past for biomarkers testing^^'^^ and for 
genetic testing in bipolar disorder^^ and schizophrenia.^"^ Some- 
what similar approaches but without CFG prioritization, attempted 
by other groups, have been either unsuccessful^^ or have required 
very large panels of markers.^^ 

We chose all the nominally significant P-value SNPs (P < 0.05) in 
each of our top CFG-prioritized genes (n = 135 with CFG score >8; 
Supplementary Table SI) in the GWAS1 data set used for 
discovery, and assembled a GRPS-135 panel out of those SNPs 
(Table 4). We then tested the GRPS-135 in the independent 
German test cohort 2, based on the presence or absence of the 
alleles of the SNPs associated with the illness, comparing the 
alcoholic subjects to controls (Table 4), and showed that, although 
there was a trend, we were not able to distinguish alcoholics from 
controls in both independent test cohorts. 

We then prioritized a smaller panel of 1 1 genes (Table 3) out of 
this larger panel, by using as a cross-validator the top genes from 
a stress-reactive mouse animal model for alcoholism, the DBP 
knockout mouse^ (Figure 4). The small panel (GRPS-11) showed 
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more robust results than the larger panel (Table 4), suggesting 
that it captures the key behaviorally relevant genes. 

DISCUSSION 

Our CFG approach helped to prioritize a very rich-in-signal and 
biologically interesting set of genes (Table 3 and Supplementary 
Table SI). Some, such as SNCA, CPE, DRD2 and GRIV13, have weaker 
evidence based on the GWAS data but strong independent 
evidence in terms of gene expression studies and other prior 
human or animal genetic work. Conversely, some of the top 
previous genetic findings in the field,^"^ such as ADHIC^^ (CFG 
score of 9), GABRA2^^ (CFG score of 8), as well as AUTS2 (CFG score 
of 7), CHRIV12 and KCNJ6 (CFG scores of 4) have fewer different 
independent lines of evidence, and thus received a lower CFG 
prioritization score in our analysis (Supplementary Table SI), 
although they are clearly involved in alcoholism-related processes. 
Whereas we cannot exclude that more recently discovered genes 
have had less hypothesis-driven work performed and thus might 
score lower on CFG, it is to be noted that the CFG approach 
integrates predominantly non-hypothesis-driven, discovery-type 
data sets, such as GWAS data, linkage, quantitative traits loci and, 
particularly, gene expression. We also cap each line of evidence 
from an experimental approach (Figure 1), to minimize any 
'popularity' bias, whereas multiple studies of the same kind are 
conducted on better-established genes. In the end, it is gene-level 
reproducibility across multiple approaches and platforms that is 
built into the approach and gets prioritized most by CFG scoring 
during the discovery process. Our top results subsequently show 
good reproducibility and predictive ability in independent cohort 
testing, the litmus test for any such work. 

At the very top of our list of candidate genes for alcoholism, 
with a CFG score of 13, we have SNCA, a pre-synaptic chaperone 
that has been reported to be involved in modulating brain 
plasticity and neurogenesis, as well as neurotransmission, 
primarily as a brake.^^'^^ On the pathological side, low levels of 
SNCA might offer less protection against oxidative stress,^^ 
whereas high levels of SNCA may have a role in neurodegenera- 
tive diseases, including in Parkinson disease. SNCA has been 
identified as a susceptibility gene for alcohol cravings^ and 
response to alcohol cues.^° The evidence provided by our data 
and other previous human genetic association studies suggest a 
genetic rather than purely environmental (alcohol consumption 
and stress) basis for its alteration in disease, and its potential utility 
as trait rather than purely state marker. 

Alcoholics carry a genetic variant that leads to reduced baseline 
expression of SNCA.^ SNCA is also downregulated in expression in 
the frontal cortex and caudate-putamen of inbred alcohol- 
preferring rats,^^ as well as in the brain (amygdala) and blood of 
our stress-reactive DBP animal model of alcoholism, before 
exposure to any alcohol. SNCA is upregulated in expression in 
blood in human alcoholism,^^'^^ as well as in the blood of 
monkeys consuming alcohol, and in rats after alcohol administra- 
tion.^ Thus, it may serve as a blood biomarker. Overall, we may 
infer that, whereas low levels of SNCA may predispose to cravings 
for alcohol and consequent alcoholism, possibly mediated 
through increased neurobiological activity and drive {tlie SNCA 
deficit hypotinesis), excessive alcohol consumption then increases 
SNCA expression beyond that seen in non-alcohol-consuming 
controls, potentially compounding risk for neurodegenerative 
diseases in individuals that have mutations that lead to its 
aggregation. This observation is also biologically consistent with 
the fact that dementia is often observed late in the course of 
alcohol dependence. 

GFAP (glial fibrillary acidic protein), a top candidate gene with a 
CFG score of 9.5, is an astrocyte intermediate filament-type 
protein involved in neuron-astrocyte interactions, cell adhesion, 
process formation and cell-cell communication. It is decreased in 
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Table 4. Genetic Risk Prediction Score (GRPS)-Panels fronn Discovery Cohort 1 
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Test in cohort 4 




alcohol abuse versus control 


GRPS-n, 


P = 0.0094 


top animal model (DBP mouse) prioritized genes 


(10 genes, 34 SNPs present) 


out of genes with CFG score of >8 




all nominally significant SNPs in each gene (n = 66) 




GRPS-SNCA, 


P = 0.023 


top CFG gene 


(1 gene, 1 SNP rsl 701 5888 present) 


all nominally significant SNPs in it (n = 4) 




Abbreviations: CFG, Convergent Functional Genomics; DBP, DNA-box-binding protein; SNCA, synuclein alpha; SNP, single-nucleotide length polymorphism. 
Differentiation between alcoholics and controls in three independent test cohorts using, GRPS-135, a panel composed of all the nominally significant SNPs 
from GWAS1 in the top candidate genes prioritized by CFG; GRPS-1 1, a panel additionally prioritized by a stress-reactive animal model for alcoholism, the DBP 
KO-stressed mouse; and GRPS-SNCA, the top candidate gene from our analyses. P-values depict one-tailed f-test results between alcoholics and controls. 



expression in post-mortenn brain of alcoholics, but increased in 
expression in brains of aninnal nnodels of predisposition to 
alcoholisnn, before exposure to alcohol (Table 3). This is consistent 
with a nnodel for increased physiological robustness in individuals 
predisposed to alcoholisnn,^ as well as with the neurodegenerative 
consequences of protracted alcohol use. 

DRD2 (dopannine receptor D2), another top candidate gene with 
a CFG score of 9, has prior hunnan genetic association evidence. It 
is reduced in expression in the frontal cortex in the hunnan brain 
fronn alcoholics, as well as in the DBP aninnal nnodel before any 
exposure to alcohol. One possible interpretation would be that 
lower levels of dopannine receptors are associated with reduced 
dopaminergic signaling and anhedonia, leading individuals to 
overconnpensate by alcohol and drug abuse. Another interpreta- 
tion, consistent with the low SNCA and consequently higher 
neurotransnnitter (including dopannine) levels, would be that these 
individuals are in fact in a compulsive, hyperdopaminergic state, 
which drives them to hedonic activities and leads to compensa- 
tory homeostatic downregulation of their DRD2 receptors. 
Consistent with this later scenario, mice that have a constitutive 
knockout of their DRD2 receptors, not because of a hyperdopa- 
minergic state, in fact consume less alcohol,^^ unless they are 
exposed to stress.^^ 

Another top candidate gene, GRM3, is also involved in 
neurotransmitter signaling. Prior evidence in the field had 
implicated another metabotropic glutamate receptor, GRIV12.^^ 

Other top candidate genes in the panel {MOBP, MBP and MOG) 
are involved in myelination (Table 3). They are decreased in 
expression in the prefrontal cortex of human alcoholics, as well as 
in our stress-reactive DBP animal model of alcoholism, before 
exposure to any alcohol. Decreased myelination may lead to 
decreased connectivity. Interestingly, MOBP and MBP are 
increased in expression in the amygdala in the DBP mice, opposite 
to the direction of change in the PFC, consistent with a frontal 



deactivation and a limbic hyperactivity, which could lead to 
impulsivity. 

Epistasis testing of top candidate genes for alcoholism 
For the top 11 candidate genes, best P-value SNPs from GWAS1 
were used to test for gene-gene interactions in GWAS2 
(Supplementary Table S5). Nominally significant interactions were 
found between SNPs in SNCA and RXRG, DRD2 and SYT1, MOBP 
and TIMP2. As a caveat, the P-value was not corrected for multiple 
comparisons. The corresponding genes merit future follow-up 
work to elucidate the biological and pathophysiological relevance 
of their interactions. 

Pathways and mechanisms 

Our pathway analysis (Table 2 and Supplementary Table S2) 
results are consistent with the accumulating evidence about the 
role of neuronal excitability and signaling in alcoholism.^^'^^'^"^ 

Overlap with other psychiatric disorders 

Despite using lines of evidence for our CFG approach that have to 
do only with alcoholism, the list of genes identified has a notable 
overlap at a pathway analysis level (Table 2B and Supplementary 
Table S2B) and at a gene level (Figures 4 and 5) with other 
psychiatric disorders. This is a topic of major interest and debate in 
the field. We demonstrate an overlap between top candidate 
genes for alcoholism and top candidate genes for schizophrenia, 
anxiety and bipolar disorder, previously identified by us through 
CFG (Figure 4), thus providing a possilDle molecular basis for the 
frequently observed clinical comorbidity and interdependence 
between alcoholism and those other major psychiatric disorders, 
as well as cross-utility of pharmacological agents. Moreover, we 
tested in alcoholics genetic risk predictive panels for bipolar 
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Table 5. Individual top genes and genetic risk prediction in independent cohorts 
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Table. 5. (Continued) 
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Abbreviations: GRPS, Genetic Risk Prediction Score; SNCA, synuclein alpha; SNP, single-nucleotide length polymorphism. Italic, nominally significant; bold italic, 
survived Bonferroni correction. 
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versus controls (Figure 6), beyond the overlap in genes with before alcohol use. These results led us to develop a heuristic, 

alcohol. There seems to be an increased genetic load for bipolar testable model of alcoholism (Figure 5). Some people may drink to 
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be calm, mitigating the effects of stress and anxiety, some people 
may drink to be happy, the common drive with bipolar disorder, 
and some people may drink to be drunk, to disconnect from 
reality and/or get unstuck from internal obsessions and 
ruminations. 

Genetic risk prediction 

Of note, our SNP panels and choice of affected alleles were based 
solely on analysis of the discovery GWAS, completely indepen- 
dently from the test cohorts. Our results show that a relatively 
limited and well-defined panel of SNPs identified based on our 
CFG analysis could differentiate between alcoholism subjects and 
controls in three independent cohorts. The fact that our genetic 
testing worked for both alcohol dependence and alcohol abuse 
suggests that these two diagnostic categories are actually 
overlapping, supporting the DSM-V reclassification of a single 
category of alcohol use disorders. 

Reproducibility among studies 

Our work provides striking evidence for the advantages, 
reproducibility and consistency of gene-level analyses of data, as 
opposed to SNP level analyses, pointing to the fundamental issue 
of genetic heterogeneity at a SNP level. In fact, it may be that the 
more biologically important a gene is for higher mental functions, 
the more heterogeneity it has at a SNP level and the more 
evolutionary divergence, for adaptive reasons. On top of that, CFG 
provides a way to prioritize genes based on disease relevance, not 
study-specific effects (that is, fit-to-disease as opposed to fit-to- 
cohort). Reproducibility of findings across different studies, 
experimental paradigms and technical platforms is deemed more 
important (and scored as such by CFG) than the strength of 
finding in an individual study (for example, P-value in a GWAS). 

Potential limitations and confounds 

The GWAS study (cohort 1) on which our discovery was based 
contained males as probands but contained males and females as 
controls. This was the case for the German test cohort (cohort 2) as 
well. It is possible that some of the nominally significant SNPs 
identified in the discovery GWAS have to do with gender 
differences rather than to alcoholism per se, or at least may have 
to do with male alcoholism. Stratification across gender and 
ethnicities may also be a factor in our test cohorts 3 and 4 
(Table 1). The issue of possible ethnicity differences in alleles, 
genes and the consequent neurobiology may need to be explored 
more in the future, with larger sample sizes, and with environ- 
mental and cultural factors taken into account. However, the use 
of a CFG approach using evidence from other studies of 
alcoholism, including animal model studies, to prioritize the 
findings decreases the likelihood that our final top results are 
ethnicity- or gender-related. Of note, our GRPS predictions 
separate alcoholics from controls in independent test cohorts, in 
both genders, and in fact work even better at separating female 
alcoholics from female controls (Figure 3). Moreover, a series of 
individual genes from the panel, not just SNCA, separates 
alcoholics from controls in independent cohorts (Table 5). 

The conversion from SNPs to genes as part of our discovery 
assumed the rule of proximity — that is, an intragenic SNP 
implicates the gene inside which it falls, or if it falls into an 
intergenic region, it implicates the most proximal gene to it. That 
may not be true in reality in all cases, generating potentially false- 
positives and false-negatives. However, the convergent approach 
and focus on the top CFG scoring genes reduce the likelihood of 
false-positives. 

The only SNP for SNCA that was present/tested for in cohorts 3 
and 4 (rsl 701 5888) was relatively far away upstream (0.13 MB) 
from SNCA. However, no other known genes are present in that 
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region, SNCA is the closest gene, and the distance is well within 
the range of known examples of regulatory regions (enhancers). In 
addition, the risk allele for this SNP (G/G) seems to be the major 
variant in the population (Supplementary Table S6), suggesting 
that this allele per se is evolutionarily advantageous, when not 
coupled with the exogenous ingestion of alcohol. 

A relatively large list of genes (n = 6085) was implicated by 
nominally significant SNPs from the discovery GWAS. There is a 
risk that out of such a large list CFG will find something to 
prioritize. We have tried to mitigate that by developing an internal 
score for each gene based on the proportion of SNPs tested in a 
gene that were nominally significant. Moreover, in the end, we 
tested the reproducibility and predictive ability of our top findings 
in multiple independent cohorts, which is the ultimate litmus test 
for any genetic or biomarker study. 

CONCLUSION 

Overall, whereas multiple mechanistic entry points may contribute 
to alcoholism pathogenesis, it is likely at its core a disease of an 
exogenous agent (alcohol) modulating different mind domains/ 
dimensions (anxiety, mood and cognition),^^ precipitated by 
environmental stress on a background of genetic vulnerability 
(Figure 5). The degree to which various mind domains/dimensions 
are affected in different individuals is a fertile area for future 
research into subtypes of alcoholism and lends itself to 
personalization of diagnosis and treatment, by integrating genetic 
data, blood gene expression biomarker data and clinical data. 
Lastly, it is important to note that individuals with a predisposition 
to alcoholism but no exposure to alcohol may in fact have a robust 
physiology and strong neurobiological drive that can be 
harnessed for other, more productive endeavors. 
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